CompatHelper: bump compat for "CUDA" to "5" #155
Benchmark results for commit dedaad6 (comparing to bca8c30):
Codecov Report

Patch and project coverage have no change.

Additional details and impacted files (coverage diff):

| | master | #155 |
|---|---|---|
| Coverage | 32.17% | 32.17% |
| Files | 11 | 11 |
| Lines | 889 | 889 |
| Hits | 286 | 286 |
| Misses | 603 | 603 |
The change in registers is due to the removal of a quirk in JuliaGPU/CUDA.jl@760c2bd#diff-d1deb39259ffe01ecfe5db7c6c2331af73b81772539dd8ef49076d58fa2529d2, but AFAICT that actually improves performance. I still have to find the culprit of the other slowdowns, but I can't reproduce them locally.
16416fe to 47ee618
Painfully bisected to JuliaGPU/CUDA.jl#2025, which doesn't make sense: we manually synchronize the stream and thus do not rely on non-blocking synchronization. The only place non-blocking synchronization is used is when allocating the arrays during the set-up of each benchmark. Maybe we're just introducing lots of noise there right before each benchmark? Getting rid of that seems to greatly improve timings, so hopefully that fixes the issue here.
47ee618 to dedaad6
This pull request changes the compat entry for the CUDA package from "3.5, 4" to "3.5, 4, 5". This keeps the compat entries for earlier versions.
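For reference, the change described above would look roughly like the following in the package's Project.toml. This is a sketch of the `[compat]` section only. Any other entries in the real file are not shown here, and the comments describe Julia's standard caret-style compat semantics rather than anything specific to this repository.

```toml
[compat]
# Before this pull request:
#   CUDA = "3.5, 4"
# After: each comma-separated entry is a caret specifier, so this allows
# [3.5.0, 4.0.0), [4.0.0, 5.0.0) and [5.0.0, 6.0.0).
CUDA = "3.5, 4, 5"
```

Because the earlier entries are kept, environments that are still pinned to CUDA 3.5+ or 4.x can continue to resolve this package.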
Note: I have not tested your package with this new compat entry. It is your responsibility to make sure that your package tests pass before you merge this pull request.